AITopics | state-action pair

We propose and analyze a model-based bootstrap for transition kernels in finite controlled Markov chains (CMCs) with possibly nonstationary or history-dependent control policies, a setting that arises naturally in offline reinforcement learning (RL) when the behavior policy generating the data is unknown. We establish distributional consistency of the bootstrap transition estimator in both a single long-chain regime and the episodic offline RL regime. The key technical tools are a novel bootstrap law of large numbers (LLN) for the visitation counts and a novel use of the martingale central limit theorem (CLT) for the bootstrap transition increments. We extend bootstrap distributional consistency to the downstream targets of offline policy evaluation (OPE) and optimal policy recovery (OPR) via the delta method by verifying Hadamard differentiability of the Bellman operators, yielding asymptotically valid confidence intervals for value and $Q$-functions. Experiments on the RiverSwim problem show that the proposed bootstrap confidence intervals (CIs), especially the percentile CIs, outperform the episodic bootstrap and plug-in CLT CIs, and are often close to nominal ($50\%$, $90\%$, $95\%$) coverage, while the baselines are poorly calibrated at small sample sizes and short episode lengths.

artificial intelligence, bootstrap, machine learning, (16 more...)

arXiv.org Machine Learning

2605.1241

Country:

North America > United States (0.46)
Europe > United Kingdom (0.28)

Genre: Research Report (1.00)

Technology:

Information Technology > Artificial Intelligence > Machine Learning > Learning Graphical Models > Undirected Networks > Markov Models (0.84)
Information Technology > Artificial Intelligence > Machine Learning > Statistical Learning (0.67)

Add feedback

Safe Exploration in Finite Markov Decision Processes with Gaussian Processes

Matteo Turchetta, Felix Berkenkamp, Andreas Krause

Neural Information Processing SystemsMay-1-2026, 05:56:07 GMT

Neural Information Processing Systems http://nips.cc/

algorithm, artificial intelligence, machine learning, (15 more...)

Neural Information Processing Systems

Country:

Europe (0.46)
North America > United States (0.28)

Technology:

Information Technology > Artificial Intelligence > Representation & Reasoning (1.00)
Information Technology > Artificial Intelligence > Machine Learning > Learning Graphical Models > Undirected Networks > Markov Models (0.64)

Add feedback

550a141f12de6341fba65b0ad0433500-Paper.pdf

Neural Information Processing SystemsApr-25-2026, 23:24:31 GMT

artificial intelligence, machine learning, reinforcement learning, (13 more...)

Neural Information Processing Systems

Country: Asia > China (0.14)

Genre: Research Report > New Finding (0.46)

Industry: Transportation (0.46)

Technology:

Information Technology > Artificial Intelligence > Robots (1.00)
Information Technology > Artificial Intelligence > Representation & Reasoning > Agents (1.00)
Information Technology > Artificial Intelligence > Machine Learning > Reinforcement Learning (1.00)

Add feedback

54e8912427a8d007ece906c577fdca60-Supplemental.pdf

Neural Information Processing SystemsApr-25-2026, 23:22:42 GMT

artificial intelligence, machine learning, q-learning, (15 more...)

Neural Information Processing Systems

Industry: Leisure & Entertainment (0.47)

Technology: Information Technology > Artificial Intelligence > Machine Learning (1.00)

Add feedback

248024541dbda1d3fd75fe49d1a4df4d-Supplemental.pdf

Neural Information Processing SystemsApr-25-2026, 03:47:08 GMT

artificial intelligence, machine learning, reinforcement learning, (16 more...)

Neural Information Processing Systems

Genre: Research Report > Experimental Study (0.46)

Industry: Education (0.46)

Technology:

Information Technology > Artificial Intelligence > Representation & Reasoning (1.00)
Information Technology > Artificial Intelligence > Machine Learning > Reinforcement Learning (1.00)
Information Technology > Artificial Intelligence > Robots (0.68)
(2 more...)

Add feedback

Beyond Uniform Sampling: Offline Reinforcement Learning with Imbalanced Datasets

Neural Information Processing SystemsApr-24-2026, 22:26:44 GMT

Offline policy learning is aimed at learning decision-making policies using existing datasets of trajectories without collecting additional data. The primary motivation for using reinforcement learning (RL) instead of supervised learning techniques such as behavior cloning is to find a policy that achieves a higher average return than the trajectories constituting the dataset. However, we empirically find that when a dataset is dominated by suboptimal trajectories, state-of-the-art offline RL algorithms do not substantially improve over the average return of trajectories in the dataset. We argue this is due to an assumption made by current offline RL algorithms of staying close to the trajectories in the dataset. If the dataset primarily consists of sub-optimal trajectories, this assumption forces the policy to mimic the suboptimal actions. We overcome this issue by proposing a sampling strategy that enables the policy to only be constrained to "good data" rather than all actions in the dataset (i.e., uniform sampling). We present a realization of the sampling strategy and an algorithm that can be used as a plug-and-play module in standard offline RL algorithms. Our evaluation demonstrates significant performance gains in 72 imbalanced datasets, D4RL dataset, and across three different offline RL algorithms.

artificial intelligence, machine learning, reinforcement learning, (17 more...)

Neural Information Processing Systems

Country: North America > United States (0.92)

Genre: Research Report > New Finding (1.00)

Industry: Government (0.67)

Technology: Information Technology > Artificial Intelligence > Machine Learning > Reinforcement Learning (1.00)

Add feedback

0f65caf0a7d00afd2b87c028e88fe931-Paper.pdf

Neural Information Processing SystemsApr-24-2026, 17:31:47 GMT

data quality, machine learning, reinforcement learning, (15 more...)

Neural Information Processing Systems

Country:

North America > United States (0.46)
North America > Canada (0.28)

Genre: Research Report (0.93)

Industry: Health & Medicine (1.00)

Technology:

Information Technology > Artificial Intelligence > Machine Learning > Reinforcement Learning (1.00)
Information Technology > Artificial Intelligence > Representation & Reasoning > Constraint-Based Reasoning (0.93)
Information Technology > Data Science > Data Quality (0.67)

Add feedback

0b9e57c46de934cee33b0e8d1839bfc2-Paper.pdf

Neural Information Processing SystemsApr-24-2026, 15:34:38 GMT

artificial intelligence, machine learning, reinforcement learning, (16 more...)

Neural Information Processing Systems

Country: Asia (0.14)

Industry: Leisure & Entertainment > Games > Computer Games (0.31)

Technology:

Information Technology > Artificial Intelligence > Machine Learning > Reinforcement Learning (0.99)
Information Technology > Artificial Intelligence > Representation & Reasoning (0.94)

Add feedback

000c076c390a4c357313fca29e390ece-Paper.pdf

Neural Information Processing SystemsApr-24-2026, 09:32:12 GMT

machine learning, reinforcement learning, state-action pair, (13 more...)

Neural Information Processing Systems

Genre: Research Report (0.46)

Technology:

Information Technology > Artificial Intelligence > Representation & Reasoning (1.00)
Information Technology > Artificial Intelligence > Machine Learning > Reinforcement Learning (0.50)

Add feedback

ASimple Solution for Offline Imitation from Observations and Examples with Possibly Incomplete Trajectories

Neural Information Processing SystemsApr-24-2026, 05:29:30 GMT

Offline imitation from observations aims to solve MDPs where only task-specific expert states and task-agnostic non-expert state-action pairs are available. Offline imitation is useful in real-world scenarios where arbitrary interactions are costly and expert actions are unavailable. The state-of-the-art'DIstribution Correction Estimation' (DICE) methods minimize divergence of state occupancy between expert and learner policies and retrieve a policy with weighted behavior cloning; however, their results are unstable when learning from incomplete trajectories, due to a non-robust optimization in the dual domain. To address the issue, in this paper, we propose Trajectory-Aware Imitation Learning from Observations (TAILO). TAILO uses a discounted sum along the future trajectory as the weight for weighted behavior cloning. The terms for the sum are scaled by the output of a discriminator, which aims to identify expert states. Despite simplicity, TAILO works well if there exist trajectories or segments of expert behavior in the task-agnostic data, a common assumption in prior work. In experiments across multiple testbeds, we find TAILO to be more robust and effective, particularly with incomplete trajectories.

machine learning, natural language, trajectory, (17 more...)

Neural Information Processing Systems

Genre:

Instructional Material (0.46)
Research Report (0.46)

Technology:

Information Technology > Artificial Intelligence > Robots (1.00)
Information Technology > Artificial Intelligence > Representation & Reasoning (1.00)
Information Technology > Artificial Intelligence > Natural Language (1.00)
(2 more...)

Add feedback

Filters

Collaborating Authors

state-action pair

Information about AI from the News, Publications, and Conferences

Automatic Classification – Tagging and Summarization – Customizable Filtering and Analysis

Model-based Bootstrap of Controlled Markov Chains

Safe Exploration in Finite Markov Decision Processes with Gaussian Processes

550a141f12de6341fba65b0ad0433500-Paper.pdf

54e8912427a8d007ece906c577fdca60-Supplemental.pdf

248024541dbda1d3fd75fe49d1a4df4d-Supplemental.pdf

Beyond Uniform Sampling: Offline Reinforcement Learning with Imbalanced Datasets

0f65caf0a7d00afd2b87c028e88fe931-Paper.pdf

0b9e57c46de934cee33b0e8d1839bfc2-Paper.pdf

000c076c390a4c357313fca29e390ece-Paper.pdf

ASimple Solution for Offline Imitation from Observations and Examples with Possibly Incomplete Trajectories